16 research outputs found

    A mathematical programming approach to SVM-based classification with label noise

    The authors of this research acknowledge financial support from the Spanish Ministerio de Ciencia y Tecnología, the Agencia Estatal de Investigación and Fondos Europeos de Desarrollo Regional (FEDER) via project PID2020-114594GB-C21, and partial support from projects FEDER-US-1256951, Junta de Andalucía P18-FR-1422, CEI-3-FQM331 and NetmeetData: Ayudas Fundación BBVA a equipos de investigación científica 2019. The first author was also supported by project P18-FR-2369 (Junta de Andalucía) and the IMAG-Maria de Maeztu grant CEX2020-001105-M/AEI/10.13039/501100011033 (Spanish Ministerio de Ciencia y Tecnología).

    In this paper we propose novel methodologies to optimally construct Support Vector Machine-based classifiers that take into account that label noise may occur in the training sample. We propose different alternatives based on solving Mixed Integer Linear and Non Linear models that incorporate decisions on relabeling some of the observations in the training dataset. The first method incorporates relabeling directly into the SVM model, while a second family of methods combines clustering and classification at the same time, giving rise to a model that simultaneously applies similarity measures and SVM. Extensive computational experiments are reported on a battery of standard datasets taken from the UCI Machine Learning repository, showing the effectiveness of the proposed approaches.
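    As a rough illustration of the relabeling idea (the notation and the penalty term below are ours, not necessarily the paper's exact formulation), a soft-margin SVM can be extended with binary variables $z_i$ that flip the label of observation $i$ when it is suspected to be noisy:

\[
\begin{aligned}
\min_{\omega,\,b,\,\xi,\,z}\quad & \tfrac{1}{2}\lVert \omega\rVert^2 \;+\; C\sum_{i=1}^{n}\xi_i \;+\; C_R\sum_{i=1}^{n} z_i\\
\text{s.t.}\quad & y_i\,(1-2z_i)\,\bigl(\omega^{\top}x_i + b\bigr) \;\ge\; 1-\xi_i, \qquad i=1,\dots,n,\\
& \xi_i \ge 0,\quad z_i\in\{0,1\}, \qquad i=1,\dots,n,
\end{aligned}
\]

    where $z_i=1$ means that observation $i$ is relabeled (its label $y_i$ is flipped) and $C_R$ penalizes the number of relabelings. The product of $z_i$ with the margin expression is what makes the model nonlinear; it can be linearized with big-M constraints, leading to Mixed Integer formulations of the kind mentioned above.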

    Multiclass optimal classification trees with SVM-splits

    In this paper we present a novel mathematical optimization-based methodology to construct tree-shaped classification rules for multiclass instances. Our approach consists of building Classification Trees in which, except at the leaf nodes, the labels are temporarily left out and grouped into two classes by means of an SVM separating hyperplane. We provide a Mixed Integer Non Linear Programming formulation for the problem and report the results of an extended battery of computational experiments to assess the performance of our proposal with respect to other benchmark classification methods.

    Universidad de Sevilla/CBUA; Spanish Ministerio de Ciencia y Tecnología, Agencia Estatal de Investigación and Fondos Europeos de Desarrollo Regional (FEDER) via project PID2020-114594GB-C21; Junta de Andalucía projects FEDER-US-1256951, P18-FR-1422, CEI-3-FQM331, B-FQM-322-UGR20, AT 21_00032; Fundación BBVA through project NetmeetData: Big Data 2019; UE-NextGenerationEU (ayudas de movilidad para la recualificación del profesorado universitario); IMAG-Maria de Maeztu grant CEX2020-001105-M/AEI/10.13039/501100011033
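    At prediction time, a tree with SVM splits routes an observation through hyperplane decisions until it reaches a labeled leaf. The sketch below illustrates only this routing step; the `Node` class, its attributes and the toy tree are ours, not part of the paper's formulation.

```python
# Minimal sketch (assumed representation): routing an observation through a tree
# whose internal nodes split with hyperplanes (w, b), as an SVM-split tree would
# do at prediction time.
import numpy as np

class Node:
    def __init__(self, w=None, b=0.0, left=None, right=None, label=None):
        self.w, self.b = w, b              # hyperplane of an internal node
        self.left, self.right = left, right
        self.label = label                 # class label if the node is a leaf

def predict(node, x):
    """Follow hyperplane splits until a leaf is reached."""
    while node.label is None:
        node = node.left if np.dot(node.w, x) + node.b >= 0 else node.right
    return node.label

# Tiny example: the root splits on the first feature; children are leaves.
tree = Node(w=np.array([1.0, 0.0]), b=-0.5,
            left=Node(label=1), right=Node(label=0))
print(predict(tree, np.array([0.9, 0.2])))   # -> 1
```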

    Juegos de evasión

    The goals of this work are twofold. The first concerns the mathematics of pursuit and evasion, models known as chase-and-escape games. Five problems are solved using classic techniques from mathematical analysis, and a sixth one, of a different nature, is solved from a probabilistic point of view. The second goal, of an outreach character and carried out in collaboration with an Architecture student, consists in the design of a real park where the problems described above can be put into practice. The park, intended to be located in Seville, is conceived as a way to bring the world of mathematics closer to the city's inhabitants through an enjoyable experience, showing that mathematics combined with logical reasoning is a powerful tool.Universidad de Sevilla. Grado en Matemáticas

    On the multisource hyperplanes location problem to fitting set of points

    In this paper we study the problem of locating a given number of hyperplanes minimizing an objective function of the closest distances from a set of points. We propose a general framework for the problem in which norm-based distances between points and hyperplanes are aggregated by means of ordered median functions. A compact Mixed Integer Linear (or Non Linear) programming formulation is presented for the problem, and an extended set partitioning formulation with an exponential number of variables is also derived. We develop a column generation procedure embedded within a branch-and-price algorithm for solving the problem by adequately performing its preprocessing, pricing and branching. We also analyze the optimal solutions of the problem geometrically, deriving properties which are exploited to generate initial solutions for the proposed algorithms. Finally, the results of an extensive computational experience are reported. The issue of scalability is also addressed, showing theoretical upper bounds on the errors incurred by replacing the original datasets with aggregated versions.

    Comment: 30 pages, 5 tables, 3 figures
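    For intuition, an ordered median aggregation of the closest point-to-hyperplane distances (written in our own notation, not necessarily the paper's) has the form

\[
\min_{H_1,\dots,H_p}\;\sum_{i=1}^{n}\lambda_i\, d_{(i)},
\qquad\text{with } d_i=\min_{j=1,\dots,p}\operatorname{dist}(x_i,H_j),
\]

    where $d_{(1)}\ge\dots\ge d_{(n)}$ are the closest distances sorted in nonincreasing order and $\lambda\in\mathbb{R}^n_{+}$ is a vector of ordered median weights. Choosing $\lambda=(1,\dots,1)$ recovers the median (sum of distances) criterion, while $\lambda=(1,0,\dots,0)$ yields the center (maximum distance) criterion.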

    A Mathematical Programming Approach to Optimal Classification Forests

    In this paper, we introduce Optimal Classification Forests, a new family of classifiers that takes advantage of an optimal ensemble of decision trees to derive accurate and interpretable classifiers. We propose a novel mathematical optimization-based methodology in which a given number of trees are simultaneously constructed, each of them providing a predicted class for the observations in the feature space. The classification rule is derived by assigning to each observation its most frequently predicted class among the trees in the forest. We provide a mixed integer linear programming formulation for the problem. We report the results of our computational experiments, from which we conclude that our proposed method has equal or superior performance compared with state-of-the-art tree-based classification methods. More importantly, it achieves high prediction accuracy with, for example, orders of magnitude fewer trees than random forests. We also present three real-world case studies showing that our methodology has very interesting implications in terms of interpretability.

    Comment: 24 pages, 9 figures, 1 table
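    The majority-vote rule described above is easy to state in code. The sketch below only illustrates that rule at prediction time (the stump-like `trees` and the function name are ours); the paper's contribution is the MILP model that builds the trees jointly.

```python
# Minimal sketch of the forest's classification rule: each tree predicts a class
# and the observation is assigned its most frequently predicted class.
from collections import Counter

def forest_predict(trees, x):
    """Return the class most frequently predicted by the trees for observation x."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Tiny illustrative "forest" of three stump-like rules on a 2-feature observation.
trees = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 0.2 else 0,
    lambda x: 0,
]
print(forest_predict(trees, (0.7, 0.1)))   # -> 0 (two of the three trees vote for 0)
```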

    Robust optimal classification trees under noisy labels

    This research has been partially supported by the Spanish Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación/FEDER grant PID2020-114594GB-C21, Junta de Andalucía projects P18-FR-1422 and P18-FR-2369, and projects FEDER-US-1256951, B-FQM-322-UGR20, CEI-3-FQM331 and NetmeetData-Ayudas Fundación BBVA a equipos de investigación científica 2019. The first author was also partially supported by the IMAG-Maria de Maeztu grant CEX2020-001105-M/AEI/10.13039/501100011033.

    In this paper we propose a novel methodology to construct Optimal Classification Trees that takes into account that noisy labels may occur in the training sample. The motivation for this new methodology is the superadditive effect of combining margin-based classifiers with outlier detection techniques. Our approach rests on two main elements: (1) the splitting rules for the classification trees are designed to maximize the separation margin between classes, applying the SVM paradigm; and (2) some of the labels of the training sample are allowed to be changed during the construction of the tree, trying to detect the label noise. Both features are considered and integrated together to design the resulting Optimal Classification Tree. We present a Mixed Integer Non Linear Programming formulation for the problem, suitable to be solved using any of the available off-the-shelf solvers. The model is analyzed and tested on a battery of standard datasets taken from the UCI Machine Learning repository, showing the effectiveness of our approach. Our computational results show that in most cases the new methodology outperforms the OCT and OCT-H benchmarks in both accuracy and AUC.

    New Advances In Data Science Problems Through Hyperplanes Location

    This doctoral thesis focuses on developing new approaches to different Data Science problems from a Location Theory perspective. In particular, we concentrate on locating hyperplanes by means of solving Mixed Integer Linear and Non Linear Problems. Chapter 1 introduces the baseline techniques involved in this work, which encompass Support Vector Machines, Decision Trees and Fitting Hyperplanes Theory. In Chapter 2 we study the problem of locating a set of hyperplanes for multiclass classification problems, extending the binary Support Vector Machines paradigm. We present four Mathematical Programming formulations which allow us to vary the error measures involved in the problems as well as the norms used to measure distances. We report an extensive battery of computational experiments over real and synthetic datasets which reveal the power of our approach. Moreover, we prove that the kernel trick is applicable in our method. Chapter 3 also focuses on locating a set of hyperplanes, in this case aiming to minimize an objective function of the closest distances from a set of points. The problem is treated in a general framework in which norm-based distances between points and hyperplanes are aggregated by means of ordered median functions. We present a compact formulation and also a set partitioning one. A column generation procedure is developed in order to solve the set partitioning problem. We report the results of an extensive computational experience, as well as theoretical results on scalability issues and a geometrical analysis of the optimal solutions. Chapter 4 addresses the problem of finding a separating hyperplane for binary classification problems in which label noise is considered to occur in the training sample. We derive three methodologies, two of them based on clustering techniques, which incorporate the ability to relabel observations, i.e., to treat them as if they belonged to the opposite class, during the training process. We report computational experiments that show how our methodologies obtain higher accuracies when training samples contain label noise. Chapters 5 and 6 consider the problem of locating a set of hyperplanes, following the Support Vector Machines classification principles, in the context of Classification Trees. The methodologies developed in both chapters inherit properties from Chapter 4, which play an important role in the problem formulations. On the one hand, Chapter 5 focuses on binary classification problems where label noise can occur in training samples. On the other hand, Chapter 6 focuses on solving the multiclass classification problem. Both chapters present the results of our computational experiments, which show how the derived methodologies outperform other Classification Tree methodologies. Finally, Chapter 7 presents the conclusions of this thesis.